Self-Driving Car Engineer Nanodegree

Deep Learning

Project: Build a Traffic Sign Recognition Classifier

The goals / steps of this project are the following:


Step 0: Import all the libraries and load the data


Step 1: Dataset Summary & Exploration

The pickled data is a dictionary with 4 key/value pairs:

  1. 'features' is a 4D array containing raw pixel data of the traffic sign images, (num examples, width, height, channels).
  2. 'labels' is a 1D array containing the label/class id of the traffic sign.
  3. 'sizes' is a list containing tuples, (width, height), representing the original width and height of each image.
  4. 'coords' is a list containing tuples, (x1, y1, x2, y2), representing the coordinates of a bounding box around the sign in each image.

Provide a Basic Summary of the Data Set Using Python, Numpy and/or Pandas
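A summary along these lines can be computed with plain NumPy; the dictionary keys below are just one way to package the numbers, assuming the splits follow the (N, H, W, C) layout of the pickled arrays.

```python
import numpy as np


def summarize(X_train, y_train, X_valid, X_test):
    """Basic dataset summary: split sizes, image shape, and class count."""
    return {
        "n_train": len(X_train),
        "n_valid": len(X_valid),
        "n_test": len(X_test),
        "image_shape": X_train[0].shape,
        "n_classes": len(np.unique(y_train)),
    }
```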

Include an exploratory visualization of the dataset

Visualize the German Traffic Signs Dataset using the pickled file(s). This is open ended, suggestions include: plotting traffic sign images, plotting the count of each sign, etc.

The Matplotlib examples and gallery pages are a great resource for doing visualizations in Python.

NOTE: It's recommended you start with something simple first. If you wish to do more, come back to it after you've completed the rest of the sections. It can be interesting to look at the distribution of classes in the training, validation and test set. Is the distribution the same? Are there more examples of some classes than others?


Step 2: Design and Test a Model Architecture

Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.

1. Image Data Preprocessing

1. Balancing data, converting to grayscale and normalization

The preprocessing is divided into 3 steps. Details are described in the following sections.

  1. Balancing data
  2. Converting to grayscale
  3. Normalization (applied per image, as stage 2)

1. Balancing data

The training set is not well balanced across classes. To compensate, training samples for each class are duplicated at random so that every class ends up with the same number of samples. A different set of duplicates is drawn for each training epoch. Since copying alone would leave many identical samples within a class, image augmentation is then applied to each set.
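One way to implement this oversampling is to draw, per epoch, a fresh index array that brings every class up to the size of the largest class; the function name and the use of a NumPy `Generator` are choices of this sketch, not taken from the write-up.

```python
import numpy as np


def balance_indices(y, rng=None):
    """Return shuffled indices that oversample every class to the size of
    the largest class. A fresh draw per epoch gives different duplicate
    sets each time, which is what lets the later per-image augmentation
    produce distinct copies."""
    if rng is None:
        rng = np.random.default_rng()
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c in classes:
        members = np.flatnonzero(y == c)
        idx.append(rng.choice(members, size=target, replace=True))
    idx = np.concatenate(idx)
    rng.shuffle(idx)
    return idx
```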

2. Converting to grayscale

Convert the images to grayscale and scale the pixel values to the range [0, 1]. This forms the initial data.
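A minimal version of this step, assuming (N, H, W, 3) RGB uint8 input; the luminance weights are the common Rec. 601 choice, not something the write-up specifies.

```python
import numpy as np


def to_grayscale(images):
    """Convert (N, H, W, 3) RGB uint8 images to (N, H, W, 1) floats in [0, 1]."""
    gray = images[..., 0] * 0.299 + images[..., 1] * 0.587 + images[..., 2] * 0.114
    return (gray / 255.0)[..., np.newaxis].astype(np.float32)
```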

3. Normalization

Normalize each image so that its minimum pixel value is mapped to 0 and its maximum to 255. This step is applied right before training or evaluation.
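A per-image min-max stretch along these lines, assuming a batch laid out as (N, H, W, C); the epsilon guard against flat (constant) images is an addition of this sketch.

```python
import numpy as np


def normalize_per_image(images):
    """Stretch each image so its minimum maps to 0 and its maximum to 255."""
    lo = images.min(axis=(1, 2, 3), keepdims=True)
    hi = images.max(axis=(1, 2, 3), keepdims=True)
    # np.maximum with a tiny epsilon avoids division by zero on flat images
    return 255.0 * (images - lo) / np.maximum(hi - lo, 1e-8)
```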

2. Image augmentation

Image augmentation is performed in four ways:

  1. Random lighting by adjusting gamma
  2. Random Rotation
  3. Random Scale
  4. Random Translation

Image augmentation is performed after balancing the data, once per training epoch.
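The four augmentations above can be sketched for a single image as follows. The parameter ranges (gamma, degrees, scale factor, pixel shifts) are illustrative assumptions, not values from the write-up, and rotation, scale, and translation are folded into one affine transform via `scipy.ndimage`.

```python
import numpy as np
from scipy import ndimage


def augment(image, rng):
    """Randomly relight, rotate, scale, and translate one (H, W) or
    (H, W, C) image with values in [0, 1], preserving its shape."""
    h, w = image.shape[:2]
    # 1. Random lighting via gamma correction
    out = np.clip(image, 0.0, 1.0) ** rng.uniform(0.5, 2.0)
    # 2-4. Rotation, scale, and translation as a single affine map
    angle = np.deg2rad(rng.uniform(-15, 15))
    scale = rng.uniform(0.9, 1.1)
    shift = rng.uniform(-2, 2, size=2)  # translation in pixels
    c, s = np.cos(angle), np.sin(angle)
    matrix = np.array([[c, -s], [s, c]]) / scale
    center = np.array([h, w]) / 2.0
    # affine_transform maps output coords to input coords: in = M @ out + offset
    offset = center + shift - matrix @ center
    if out.ndim == 3:
        # apply the same 2-D transform to every channel
        out = np.stack(
            [ndimage.affine_transform(out[..., k], matrix, offset, mode="nearest")
             for k in range(out.shape[-1])],
            axis=-1,
        )
    else:
        out = ndimage.affine_transform(out, matrix, offset, mode="nearest")
    return out
```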

1. Random lighting by adjusting gamma

2. Random Rotation

3. Random Scale

4. Random Translation

5. Combining all methods

Preprocess all the images using stage 1 first

Model Architecture

My model is defined in the following code.

Train, Validate and Test the Model

A validation set can be used to assess how well the model is performing. Low accuracy on both the training and validation sets implies underfitting. High accuracy on the training set but low accuracy on the validation set implies overfitting.

Train the model.

Plot the results.

Analyze performance in more detail

Calculate the precision and recall for each traffic sign type from the test set. Display some of the wrong prediction images.
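Per-class precision and recall can be computed directly from the true and predicted label arrays; the function below is a plain-NumPy sketch (classes with no predictions or no samples are reported as 0 by assumption).

```python
import numpy as np


def per_class_precision_recall(y_true, y_pred, n_classes):
    """Return (precision, recall) arrays, one entry per class."""
    precision = np.zeros(n_classes)
    recall = np.zeros(n_classes)
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))   # true positives
        predicted = np.sum(y_pred == c)              # all predicted as c
        actual = np.sum(y_true == c)                 # all truly c
        precision[c] = tp / predicted if predicted else 0.0
        recall[c] = tp / actual if actual else 0.0
    return precision, recall
```

The same function can be run on the training, validation, and test splits to fill in the three tables below.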

Precision and recall for each traffic sign type from the training set

Precision and recall for each traffic sign type from the validation set

Precision and recall for each traffic sign type from the test set


Step 3: Test the Model on New Images

Test the model on 10 pictures of German traffic signs, either found on the web or taken by myself.

Load and Output the Images

Load the images from the folder external_image and display them.

Predict the Sign Type for Each Image

Run predictions on the 10 new images. Each prediction is shown in the title of the image.

Analyze Performance

Compute the accuracy on the 10 new images.

Output Top 5 Softmax Probabilities For Each Image Found on the Web

For each of the new images, print out the model's softmax probabilities to show the certainty of the model's predictions (limit the output to the top 5 probabilities for each image) and plot the corresponding bar charts.
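The softmax-and-top-k step can be done outside the graph with plain NumPy (equivalent to `tf.nn.top_k` applied to `tf.nn.softmax` output); this sketch assumes a 2-D batch of logits.

```python
import numpy as np


def top_k_softmax(logits, k=5):
    """Softmax over the last axis of a (N, n_classes) logits batch,
    then the top-k probabilities and class ids per image."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    idx = np.argsort(probs, axis=-1)[:, ::-1][:, :k]  # descending order
    return np.take_along_axis(probs, idx, axis=-1), idx
```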


Step 4: Visualize the Neural Network's State with Test Images

This section is not required but serves as an additional exercise for understanding the output of a neural network's weights. While neural networks can be a great learning device, they are often referred to as a black box. We can better understand what the weights of a neural network look like by plotting their feature maps. After successfully training your neural network, you can see what its feature maps look like by plotting the output of the network's weight layers in response to a test stimulus image. From these plotted feature maps, it's possible to see what characteristics of an image the network finds interesting. For a sign, maybe the inner network feature maps react with high activation to the sign's boundary outline or to the contrast in the sign's painted symbol.

Provided for you below is the function code that allows you to get the visualization output of any TensorFlow weight layer you want. The inputs to the function are a stimulus image (one used during training or a new one you provide) and the TensorFlow variable name that represents the layer's state during the training process. For instance, if you wanted to see what the LeNet lab's feature maps looked like for its second convolutional layer, you could enter conv2 as the tf_activation variable.

For an example of what feature map outputs look like, check out NVIDIA's results in their paper End-to-End Deep Learning for Self-Driving Cars in the section Visualization of internal CNN State. NVIDIA was able to show that their network's inner weights had high activations to road boundary lines by comparing feature maps from an image with a clear path to one without. Try experimenting with a similar test to show that your trained network's weights are looking for interesting features, whether it's looking at differences in feature maps from images with or without a sign, or even what feature maps look like in a trained network vs a completely untrained one on the same sign image.

Combined Image

Your output should look something like the combined feature-map image above.